[#14909] Fix for duplicate key issue when diffing catalogs with arrays #14922
What
Fix for the duplicate key exception thrown when diffing catalogs that contain arrays in the schema (see #14909). The exception occurs because `CatalogHelpers.getFullyQualifiedFieldNamesWithTypes()` returns the same name for both the array node and its child "items" node, causing a collision during `Stream.collect()` in `CatalogHelpers.getStreamDiff()`.

Please see the updated unit test and also #14460 for examples of schemas that trigger this exception with the existing code.
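For readers unfamiliar with this failure mode, here is a minimal sketch of how two entries with the same key break a stream collection. It assumes the collection step uses `Collectors.toMap` without a merge function, which this description does not confirm. The class `DuplicateKeyDemo`, the method `index`, and the field name `tags` are hypothetical and are not part of `CatalogHelpers`.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DuplicateKeyDemo {

  // Hypothetical stand-in for collecting (field path -> type) pairs into a map.
  // Collectors.toMap without a merge function throws IllegalStateException
  // as soon as two entries share the same key.
  public static Map<List<String>, String> index(List<Map.Entry<List<String>, String>> fields) {
    return fields.stream()
        .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
  }

  public static void main(String[] args) {
    // The array node and its "items" node end up with the same path.
    List<Map.Entry<List<String>, String>> fields = List.of(
        Map.entry(List.of("tags"), "array"),
        Map.entry(List.of("tags"), "string"));
    try {
      index(fields);
    } catch (IllegalStateException e) {
      System.out.println("collision: " + e.getMessage());
    }
  }
}
```

With distinct paths for the two nodes, the same call completes normally.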
How
The issue was caused by `getFullyQualifiedFieldNamesWithTypes()` filtering the `FieldNameOrList` list instances out of the path, so that the "items" node ends up with the same name list as its parent array node. The proposed fix adds the string "items" to the name list instead of ignoring that list instance altogether. However, this introduces a new "items" node that could be included in the final catalog diff. I don't know how or where this diff is used upstream (including the frontend), so I'm not sure whether this would cause any issues.

An alternative would be to exclude either the array node or the "items" node from the result of `getFullyQualifiedFieldNamesWithTypes()`. The only downside to that approach is that it could not detect diffs where an array field turned into an object field with the same child schema, or vice versa, since the resulting field list would look the same (though this is probably an edge case).

Recommended reading order
CatalogHelpers.java
CatalogHelpersTest.java
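As a rough illustration of the change described under How, the sketch below contrasts dropping list steps from a path with mapping them to the string "items". It does not use the real `FieldNameOrList` type: a path step is modelled here as an `Optional<String>`, where an empty value stands for a list step. The class `ItemsPathDemo` and both method names are hypothetical.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

public class ItemsPathDemo {

  // Old behaviour (sketch): list steps are filtered out, so the array node
  // and its "items" node flatten to the same list of names.
  public static List<String> namesDroppingLists(List<Optional<String>> path) {
    return path.stream()
        .filter(Optional::isPresent)
        .map(Optional::get)
        .collect(Collectors.toList());
  }

  // Proposed behaviour (sketch): a list step contributes the name "items",
  // so the child node gets a distinct path.
  public static List<String> namesWithItems(List<Optional<String>> path) {
    return path.stream()
        .map(step -> step.orElse("items"))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Optional<String>> arrayNode = List.of(Optional.of("tags"));
    List<Optional<String>> itemsNode = List.of(Optional.of("tags"), Optional.empty());

    // Same names for both nodes: this is the collision.
    System.out.println(namesDroppingLists(arrayNode).equals(namesDroppingLists(itemsNode)));
    // Different names once "items" is kept.
    System.out.println(namesWithItems(arrayNode).equals(namesWithItems(itemsNode)));
  }
}
```

The trade-off noted above still applies: the extra "items" path becomes a field in its own right and may surface in the diff.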
🚨 User Impact 🚨
Unknown, depending on how and where the results of `CatalogHelpers.getCatalogDiff()` are used.

Close #14909